Text classification model for methamphetamine-related tweets in Southeast Asia using dual data preprocessing techniques
Authors
Abstract
<span>Methamphetamine addiction is a prominent problem in Southeast Asia. Drug addicts often discuss illegal activities on popular social networking services. These individuals spread messages on social media as a means of both buying and selling drugs online. This paper proposes a model, the “text classification model for methamphetamine tweets in Southeast Asia” (TMTA), to identify whether a tweet from Southeast Asia is related to methamphetamine abuse. The research addresses the weakness of bag of words (BoW) by introducing BoW and Word2Vec feature selection (BWF) techniques. A domain-based feature selection method was performed using the dataset and Word2Vec. BWF provided a smaller number of features than the TF–IDF dataset. We experimented with three candidate classifiers: support vector machine (SVM), decision tree (J48) and naive Bayes (NB). We found that the J48 classifier had the best performance for TMTA in terms of accuracy (0.815), F-measure (0.818), Kappa (0.528), Matthews correlation coefficient (0.529) and a high area under the ROC curve (0.763). Moreover, it had the lowest runtime (3.480 seconds) on the dataset.</span>
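The evaluation measures named in the abstract (accuracy, F-measure, Kappa, Matthews correlation coefficient) can all be derived from a binary confusion matrix. The following is a minimal sketch of those standard formulas, not the authors' evaluation code. The function name `binary_metrics` and the counts in the usage line are hypothetical, and the F-measure here is computed for the positive class only, which may differ from the averaging used in the paper.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute standard metrics from binary confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives
    and true negatives. Hypothetical helper for illustration only.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total

    # Positive-class precision, recall and F-measure (F1).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = accuracy
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected) if p_expected != 1 else 0.0

    # Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "f_measure": f_measure,
        "kappa": kappa,
        "mcc": mcc,
    }


# Usage with made-up counts (not data from the paper).
print(binary_metrics(tp=40, fp=10, fn=10, tn=40))
```

Kappa and MCC both correct for class imbalance, which is why they can be much lower than accuracy on a skewed dataset, as in the figures reported above.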
Similar resources
The clustering and classification data mining techniques in insurance fraud detection: the case of Iranian car insurance
Given the ever-increasing spread of fraud in the insurance sector, especially in automobile insurance, and its negative consequences for insurance companies, applying suitable and efficient methods to identify and detect fraud in this area is essential. Understanding the patterns in data on previously reported claims can help determine whether a damage claim is genuine or not. One of the most common and widely used ways of discovering patterns in data is the use of ...
Evaluating preprocessing techniques in a Text Classification problem
Aiming to assess the importance of the preprocessing phase on the text classification problem, we applied the Support Vector Machine paradigm to the Portuguese Attorney General’s Office dataset (written in the European Portuguese language) and the Reuters dataset. Searching for the best document representation, we evaluated and analysed some known feature reduction/construction, feature subset ...
Optimising Sentiment Classification using Preprocessing Techniques
Sentiment Classification refers to the computational techniques for classifying whether the sentiments of text are positive or negative. Sentiment Classification being a specialized domain of text mining is expected to benefit after preprocessing. In this paper we propose various models with selective combinations of preprocessing techniques and Sentiment Classifiers, to optimize Sentiment Clas...
Data Preprocessing for Intrusion Detection System using Swarm Intelligence Techniques
Due to access of malicious data in internet, Intrusion detection system becomes an important element in system security that controls real time data and leads to huge dimensional problem, so a data pre-processing is necessary to reduce haziness and to clean network data. To reduce false positive rate and to increase efficiency of detection, the paper proposed a new swarm intelligence technique ...
Journal
Journal title: International Journal of Power Electronics and Drive Systems
Year: 2021
ISSN: 2722-2578, 2722-256X
DOI: https://doi.org/10.11591/ijece.v11i4.pp3617-3628